PM2.5 concentration prediction based on EEMD-ALSTM

The concentration prediction of PM2.5 plays a vital role in controlling air pollution and improving the environment. This paper proposes a prediction model, EEMD-ALSTM, based on Ensemble Empirical Mode Decomposition (EEMD), an attention mechanism and a Long Short-Term Memory (LSTM) network. Decomposition is combined with LSTM, and an attention mechanism is introduced, to predict the PM2.5 concentration. The advantage of the EEMD-ALSTM model is that it decomposes and recombines the original data using ensemble empirical mode decomposition, which reduces the high nonlinearity of the original data, and additionally introduces the attention mechanism, which enhances the model's extraction and retention of data features. Experimental comparison showed that the EEMD-ALSTM model reduced MAE and RMSE by about 15% while maintaining the same R² coefficient of determination, and that the stability of the model in the prediction process also improved significantly.

transmitted to the next level in the network 13,15. Therefore, adding attention has been shown to significantly improve the accuracy of the results in many prediction models.
Data processing is also an important factor affecting the quality of machine learning results. Decomposition-and-combination methods are mainly used for signal processing. In natural language processing tasks, most of the data is highly nonlinear, so reducing the dimensionality of highly nonlinear data is one of the important issues in such tasks. Empirical Mode Decomposition (EMD), proposed by Huang et al., is a data processing method that decomposes a signal into components in different time and frequency domains and thereby reduces the nonlinearity in the signal 16,17. However, because EMD suffers from modal aliasing, some scholars have instead used EEMD to process the data before prediction 18,19. In addition, Yang et al. used decomposition methods such as EEMD to reduce the nonlinearity of the data and to deepen their understanding of it 20. Zhu et al. demonstrated experimentally that attention mechanisms help predictive models extract data features and deepen their understanding of the data 21. Together, these studies support the feasibility of EEMD and the attention mechanism for understanding data in depth from two sides: input and feature extraction.
At present, many of the models mentioned above rely heavily on multiple influencing factors for prediction 20,22, predicting from various factors that affect air quality or PM2.5. The advantage of these models is that the different influencing factors jointly inform the prediction, but this is also their disadvantage: such large and diverse experimental data require a lot of equipment to collect. To address this issue, this article proposes using a single variable in the dataset for long-term prediction of PM2.5. The method first applies EEMD to decompose the original signal and reduce the dimensionality of the original data features, and then adds an attention mechanism to enhance the model's ability to extract and focus on data features for prediction.

Materials
This article takes changes in PM2.5 in Beijing as an example, using a dataset from the UCI Machine Learning Repository of the University of California. The dataset includes 43,824 h of air data from January 1, 2010 to December 31, 2014, with hourly records for Beijing of PM2.5 concentration, dew point, temperature, wind direction, air pressure and other items. This article mainly uses the PM2.5 data for experiments. The data can be found at: https://archive.ics.uci.edu/dataset/381/beijing+pm2+5+data.
Among the data, the average PM2.5 concentration is 71.099 μg/m³, the minimum value is 2.000 μg/m³, and the maximum value reaches 882.000 μg/m³. Records with missing data are deleted, and the remaining data are then reordered by date. Finally, the processed data are written to a new table.
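The preprocessing steps described above (delete missing records, reorder by date, write a new table) can be sketched as follows. This is a minimal illustration rather than the authors' script: the column names `year`, `month`, `day`, `hour` and `pm2.5`, and the `NA` marker for missing values, are assumptions about the CSV layout, and the sample rows are toy values, not values from the dataset.

```python
import csv
import io
from datetime import datetime

def clean_pm25(rows):
    """Drop records with missing PM2.5 and return (timestamp, value) pairs sorted by time."""
    cleaned = []
    for row in rows:
        raw = row["pm2.5"].strip()
        if raw in ("", "NA"):  # assumed missing-value markers
            continue
        stamp = datetime(int(row["year"]), int(row["month"]),
                         int(row["day"]), int(row["hour"]))
        cleaned.append((stamp, float(raw)))
    cleaned.sort(key=lambda pair: pair[0])
    return cleaned

def summarize(cleaned):
    """Mean, minimum and maximum of the cleaned concentrations."""
    values = [value for _, value in cleaned]
    return {"mean": sum(values) / len(values), "min": min(values), "max": max(values)}

def write_table(cleaned, handle):
    """Write the cleaned series to a new two-column CSV table."""
    writer = csv.writer(handle)
    writer.writerow(["datetime", "pm2.5"])
    for stamp, value in cleaned:
        writer.writerow([stamp.isoformat(sep=" "), value])

# Toy input: deliberately out of order, with one missing record.
SAMPLE = """year,month,day,hour,pm2.5
2010,1,2,1,20
2010,1,2,0,10
2010,1,1,0,NA
2010,1,2,2,30
"""
sample_clean = clean_pm25(csv.DictReader(io.StringIO(SAMPLE)))
sample_stats = summarize(sample_clean)
```

Applied to the full dataset, `summarize` would be the place to check the mean, minimum and maximum quoted above.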

Methods
In this section, we introduce the details of EEMD, the attention mechanism and LSTM. Based on these three components, an EEMD-ALSTM model is constructed. The goal of the model is to predict the actual PM2.5 concentration after 24 h. It also predicts the sub-signals generated in the model simultaneously. Therefore, it can be regarded as a kind of multi-step prediction model.
(1) EEMD The EEMD method is a locally adaptive time series analysis technique developed in recent years; it is a nonlinear and non-stationary time series analysis method based on empirical mode decomposition (EMD) 23. The essence of EMD is to decompose a nonlinear signal into intrinsic mode signals at different time scales using both time-domain and frequency-domain processing.
Each decomposed signal carries the local characteristics of the original signal at the corresponding time scale 24. That is, a signal is decomposed into multiple signals containing the intrinsic modes of the original signal. The advantage of EMD is its strong adaptability: it can decompose signals with different cut-off frequencies and bandwidths based on the original signal itself, and each component expresses properties of the original signal. However, EMD incorporates features from other time scales during decomposition, resulting in modal aliasing, and it lacks a standard criterion for stopping the iteration 25. A further disadvantage is that similar-scale signals can appear mixed at the same time scale, and that endpoint effects occur. EEMD compensates for these shortcomings.
EEMD starts by adding noise, which increases the number of zeros in the original signal, to assist signal processing 23. It first adds positive Gaussian white noise to the original signal, treats the noise-added signal as a whole, and then performs EMD on it to obtain the modal components (IMFs). These steps are then repeated to obtain the IMF components of each EMD run. Note that whether a decomposed IMF component is pure noise or a physically significant component of the original sequence can be determined through significance testing: its properties can be judged from the energy spectral density and period distribution of each IMF component, and the desired IMFs are then selected.
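The noise-assisted procedure above can be illustrated with a simplified sketch. It is not the implementation used in this work, and it makes several assumptions that should be stated plainly: the envelopes are piecewise-linear rather than the cubic splines normally used in EMD, sifting runs for a fixed number of iterations instead of using a stopping criterion, the noise is zero-mean Gaussian with an absolute standard deviation `noise_std`, the IMFs of all noisy runs are averaged component by component (the usual ensemble step of EEMD), and the significance test on the IMFs is omitted. All function and parameter names are hypothetical.

```python
import math
import random

def _extrema(signal):
    """Indices of interior local maxima and minima."""
    maxima, minima = [], []
    for t in range(1, len(signal) - 1):
        if signal[t - 1] < signal[t] >= signal[t + 1]:
            maxima.append(t)
        elif signal[t - 1] > signal[t] <= signal[t + 1]:
            minima.append(t)
    return maxima, minima

def _envelope(signal, idx):
    """Piecewise-linear envelope through the points at idx, pinned to both end samples."""
    n = len(signal)
    knots = sorted(set([0] + idx + [n - 1]))
    env = [0.0] * n
    for a, b in zip(knots, knots[1:]):
        for t in range(a, b + 1):
            w = (t - a) / (b - a)
            env[t] = (1 - w) * signal[a] + w * signal[b]
    return env

def emd(signal, max_imfs=3, sift_iters=10):
    """Simplified EMD: returns max_imfs IMFs (zero-padded if fewer exist) plus the residual."""
    residual = list(signal)
    imfs = []
    for _ in range(max_imfs):
        maxima, minima = _extrema(residual)
        if len(maxima) < 2 or len(minima) < 2:
            break  # too few extrema left to sift another mode
        h = list(residual)
        for _ in range(sift_iters):
            maxima, minima = _extrema(h)
            if len(maxima) < 2 or len(minima) < 2:
                break
            upper = _envelope(h, maxima)
            lower = _envelope(h, minima)
            # Subtract the local mean of the two envelopes.
            h = [x - (up + lo) / 2 for x, up, lo in zip(h, upper, lower)]
        imfs.append(h)
        residual = [r - x for r, x in zip(residual, h)]
    while len(imfs) < max_imfs:
        imfs.append([0.0] * len(signal))
    return imfs + [residual]

def eemd(signal, trials=20, noise_std=0.2, max_imfs=3, seed=0):
    """Average the EMD components of `trials` noise-perturbed copies of the signal."""
    rng = random.Random(seed)
    n = len(signal)
    total = [[0.0] * n for _ in range(max_imfs + 1)]
    for _ in range(trials):
        noisy = [x + rng.gauss(0.0, noise_std) for x in signal]
        for k, comp in enumerate(emd(noisy, max_imfs)):
            for t in range(n):
                total[k][t] += comp[t]
    return [[v / trials for v in comp] for comp in total]

# Toy signal: a fast and a slow oscillation superimposed.
demo = [math.sin(2 * math.pi * t / 8) + 0.5 * math.sin(2 * math.pi * t / 40)
        for t in range(200)]
components = eemd(demo, trials=10, noise_std=0.1)
```

Because each run returns its IMFs together with the residual, the averaged components sum back to the original signal plus the mean of the added noise, which shrinks as the number of trials grows.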
The specific process of EEMD decomposition is shown in Fig. 1.
(2) LSTM RNN is a type of neural network model in the field of deep learning that consists of a series of recurrent self-connecting structures 8,26, as shown in Fig. 2. Its main drawback is the long-range dependence problem caused by gradient explosion and gradient vanishing 27,28. Moreover, backpropagation through time involves a large amount of continuous data. If the time interval is too small, the gradient vanishes, and if the interval is too large, the gradient explodes, leading to instability of the entire neural network. LSTM is an improved neural network based on RNN that addresses the RNN's insufficient ability to handle long-term dependencies, as shown in Fig. 3.
(3) Attention mechanism ... data processing ability when the model's computing power is limited 29. In LSTM, attention weighting based on different features in temporal information can improve the correlation of neural network predictions. Figure 4 illustrates the essence of the attention mechanism. It adds a linear transformation node inside the neural network, attends to the data features input to the network, and then assigns different weights based on the distribution of attention. Its formula is
$$\mathrm{att}(X, q) = \sum_{i=1}^{N} \alpha_i x_i \qquad (1)$$
where $X$, $q$, $\alpha_i$ and $x_i$ represent the input sequence, the feature, the distribution of attention and the $i$-th item of information in the sequence, respectively.
(4) ALSTM The model enhances the LSTM network's ability to process sequential data by introducing attention mechanisms. In ALSTM, a memory system is usually designed that can store and update information, and attention mechanisms are used to focus on the important parts of the data in real time. Compared to traditional LSTM, the main structural difference of ALSTM is that it adds an attention layer to the network. This additional layer enables the model to comprehensively consider and focus on key time steps in the input sequence before making predictions. The specific process is shown in Fig. 5.
(5) EEMD-ALSTM model The model is constructed based on the above three basic models, as shown in Fig. 6. It consists of three main components, namely EEMD for temporal data, the LSTM neural network and the attention mechanism. On the left side of Fig. 6, the downloaded dataset is organized into a data table that is related only to PM2.5 concentration and then transmitted to the network. During the initial processing of these data, the data containing PM [...]ing that in the prediction process, in addition to using the LSTM model for overall prediction, attention mechanisms are also used to enhance the feature vectors. Here, we connect the hidden layer $h_t$ of the LSTM with the attention layer $x_t$. This added attention lets the prediction model connect the data information and the changes of the upper and lower layers more closely. Finally, the predictions for each signal are combined to form the final output.
(6) Evaluating indicator In order to evaluate and compare the predictive models involved in this experiment, the coefficient of determination R² (R-Square), the root mean square error (RMSE) and the mean absolute error (MAE) were used to evaluate the predictive accuracy of the models. The calculation formulas for the evaluation indicators, in their standard form, are as follows:
$$R^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert y_i - \hat{y}_i\rvert$$
where $y_i$ is the observed value, $\hat{y}_i$ the predicted value, $\bar{y}$ the mean of the observed values and $n$ the number of samples.
SVR is a commonly used regression model for time series problems 30-32. Because of the input format requirements of the SVR model, we reshaped the time series data to one dimension and used "RBF" as the kernel function, calling the SVM toolbox from sklearn for the experiments. Therefore, the horizontal axis of the support vector regression results is displayed as "Time (Hour)". The test set results obtained from the experiment using the above dataset are shown in Fig. 7. The advantages of the SVR model are its small size and its ability to handle high-dimensional data and both linear and nonlinear problems. However, the SVR model does not support parallel computing, so it may encounter performance bottlenecks when processing large-scale data. SVR also belongs to the machine learning models: for large-scale datasets, it has high computational complexity and long training time, and during training it is sensitive to parameter adjustment and to outliers in the data. As shown in Fig. 7, when approaching the peaks of PM2.5 concentration, the predicted curve cannot fully fit the true curve. Therefore, it is necessary to choose different kernel functions and regularization to prevent this.
The first step in establishing an LSTM model is to determine the input layer, intermediate layers and output layer. The input layer has 32 neurons and receives the raw data as an input vector of shape [24:1]; two LSTM layers with 32 neurons each are placed in the middle to extract and learn features. Finally, a fully connected layer is used as the output layer to produce predictions from the learned features. Compared to regular RNNs, LSTM adds a memory module that can better learn and store long-term information. The test set results obtained with the LSTM prediction model on the aforementioned dataset are shown in Fig. 8.
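The [24:1] input format above, in which 24 consecutive hourly values form one input and a later value is the target, can be produced with a simple sliding window. The sketch below is illustrative only; the function name and the `horizon` parameter are assumptions, with `horizon=1` corresponding to predicting the next hour from the previous 24 h.

```python
def make_windows(series, window=24, horizon=1):
    """Split a series into (inputs, target) pairs.

    inputs: `window` consecutive values.
    target: the value `horizon` steps after the end of the window.
    """
    samples = []
    for start in range(len(series) - window - horizon + 1):
        inputs = series[start:start + window]
        target = series[start + window + horizon - 1]
        samples.append((inputs, target))
    return samples

# Toy series: hour indices stand in for concentrations.
hours = list(range(30))
pairs = make_windows(hours)
```

Each pair holds a 24-value input and one scalar target; a larger `horizon` would shift the target further ahead without changing the input shape.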
In Fig. 8, we can clearly see that the LSTM model fits the PM2.5 concentration time series well, but there are still shortcomings. Evidently, the predictions at the low valleys of PM2.5 concentration take negative values. Theoretical knowledge shows that LSTM has a certain dependence on the data when extracting features from long-term sequences: if the data volume is insufficient or the quality is low, the performance of the model may suffer. Therefore, overfitting occurred in the LSTM model, causing the prediction curve to maintain the trend of the previous state without timely correction.
This paper developed an attention-based model, ALSTM, so as to gain a more intuitive understanding of its data requirements. The model enhances the key information of data features in the network by fully connecting the output of the LSTM with the attention module, thereby improving the accuracy of PM2.5 concentration prediction. The prediction results are shown in Fig. 9. It can be clearly seen that after adding attention, the prediction model fits well at both peaks and valleys, but there is still a gap from accurately predicting the trend of PM2.5 concentration. Furthermore, Fig. 9 shows that not all of the valleys are predicted well. Overall, however, adding the attention mechanism benefits the entire network model.
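The weighted sum in Eq. (1), applied to a sequence of LSTM hidden states, can be sketched as below. The text does not specify how the attention distribution is obtained, so this example assumes dot-product scores against a query vector followed by a softmax; that scoring choice and all names are assumptions for illustration, not the paper's attention layer.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(hidden_states, query):
    """Return the context vector sum_i alpha_i * x_i and the weights alpha."""
    # Assumed scoring rule: dot product between each hidden state and the query.
    scores = [sum(h * q for h, q in zip(state, query)) for state in hidden_states]
    alphas = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(a * state[d] for a, state in zip(alphas, hidden_states))
               for d in range(dim)]
    return context, alphas

# Three toy hidden states of dimension 2.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context, alphas = attend(states, query=[10.0, 0.0])
```

With this query, the first and third states receive almost all of the weight, which is the sense in which attention emphasizes the time steps most relevant to the prediction.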
To make the comparative experiment more convincing, a set of experimental groups based on EEMD decomposition was included. The decomposed sub-sequences and instantaneous frequency 33 are shown in Fig. 10. The EEMD-ALSTM model also uses ensemble empirical mode decomposition to process the data, and then predicts the decomposed modal signals using an attention-based LSTM model. Finally, the predicted signals are combined to form the output; the prediction results of the EEMD-ALSTM model are depicted in Fig. 12.
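The final combination step can be sketched as follows. The text only states that the predicted signals are combined; since the components of an EEMD decomposition add up to the original signal, this example assumes the combination is a plain element-wise sum of the per-component forecasts. The function name is hypothetical.

```python
def combine_predictions(component_predictions):
    """Sum the per-component forecasts time step by time step (assumed combination rule)."""
    lengths = {len(pred) for pred in component_predictions}
    if len(lengths) != 1:
        raise ValueError("all component forecasts must be non-empty lists of equal length")
    return [sum(step) for step in zip(*component_predictions)]

# Toy forecasts for two IMFs and a residual over two time steps.
combined = combine_predictions([[1.0, 2.0], [0.5, -0.5], [10.0, 10.0]])
```

If the components were rescaled before training, each forecast would need to be transformed back to the original scale before this sum.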
This work presents the predictions in the form of a table and evaluates the models using three basic indicators of predictive performance, to better demonstrate the predictive performance of the different models. The evaluation results are shown in Table 1.
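For reference, the three indicators used in Table 1 can be computed as in the following sketch, which uses the standard definitions of MAE, RMSE and R². The toy values are chosen only so that the arithmetic is easy to follow; they are not results from this work.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Toy example: only the last prediction is off, by 2.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]
scores = (mae(observed, predicted), rmse(observed, predicted), r2(observed, predicted))
```

MAE and RMSE carry the unit of the data (μg/m³ here), whereas R² is dimensionless, so relative changes in the error metrics and in R² are not directly comparable.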
Table 1 clearly shows significant differences in the evaluation indicators across the different models. The R² of the EEMD-ALSTM model is the highest, indicating that the model has the best regression stability and that its predictions of the PM2.5 concentration in the air are the most reliable. The MAE and RMSE of this model are also superior to those of the other models, indicating that the EEMD-ALSTM model provides the most accurate results when predicting PM2.5. In addition, comparing the prediction results of ALSTM and EEMD-LSTM shows that adding EEMD decomposition significantly improves the overall performance of the model. In particular, by reducing the MAE and RMSE by about 90%, the EEMD decomposition method proves highly effective in reducing the high nonlinearity of PM2.5 data. Comparing the EEMD-ALSTM and EEMD-LSTM models, we found that, while maintaining similar R² values, the MAE and RMSE of the EEMD-ALSTM model decreased by about 15%, and the stability of the model in the prediction process improved significantly. Above all, adding the attention mechanism after EEMD decomposition improves the accuracy of the model in predicting PM2.5 concentration. In other words, attention enhances the model's ability to extract features from the data and to preserve and transmit as many important features as possible.

Discussion
We plot and compare the predicted results of all models in this experiment using data from the test set, rather than all PM2.5 concentration data. There are two reasons for this. First, the training set has already been used during model training, so plotting predictions on the training set again would not fully reflect the advantages and disadvantages of the model. Second, if all the data were plotted, the difference between the predicted and true values and the overall fit could not be seen in the image. Before the data are used, empty data points are first removed from the original dataset, making the original data a continuous time series. The advantage of this approach is that it adopts the dropout concept to prevent overfitting of the model. When cutting the data, a 24-h window of PM2.5 concentration is used as one sample. In normal production and life, people often pay more attention to the air quality of the next day. Therefore, the data from the previous 24 h are used to predict the data of the next 1 h.
In the comparison between the LSTM and ALSTM models, the LSTM model has about 10% better error metrics than the ALSTM model. However, after EEMD decomposition and prediction, the ALSTM model is about one-third better than the LSTM model. This is because the network structure of the ALSTM model is more complex, resulting in slightly worse performance than the LSTM model on the raw data; once EEMD is added to decompose the data, the ALSTM-based model overtakes the LSTM-based one. The reason is that the changes in PM2.5 content in the air are highly nonlinear, so it is necessary to reduce the nonlinearity of the data. In this way, the feature information in the data can be extracted more accurately when using the model, which makes accurate predictions easier (Supplementary Information).

Conclusions
Recently, many scholars in the field of environmental protection have become increasingly interested in predicting PM2.5 concentrations. With the continuous improvement of urban air pollution prediction and management, many cities have established air quality monitoring stations. How to effectively use the data collected by these monitoring stations to improve urban air quality has become an important issue. To address this issue, we propose an LSTM model based on EEMD and an attention mechanism for PM2.5 concentration prediction.
Summarizing the results of this work, we draw the following main conclusions.
(1) PM2.5 concentration sequences are highly nonlinear, so handling this nonlinearity is unavoidable. This work first uses EEMD to process the sequence, in order to effectively reduce its nonlinearity and form regular sequences in their respective modes.
(2) This article proposes using an attention mechanism to optimize LSTM prediction, which performs better than traditional machine learning models and separately predicted models. The reason is that the attention mechanism can more accurately extract the correlation of information in the signals and assign greater weights where emphasis is needed.
(3) This article conducted 24-h predictions for each model, and the prediction results showed that the EEMD-ALSTM model had better predictive performance. MAE, RMSE and R² were 0.6715 μg/m³, 0.9451 μg/m³ and 0.9550, respectively.
(4) However, when designing the network, we used fixed network parameters, which may not be the best choice for the current network model. The selection of network parameters has a significant impact on the performance and stability of the model. Therefore, in future research, we will consider introducing network parameter search methods to further improve the capabilities of the EEMD-ALSTM model. Such methods can automatically adjust the network parameters based on the characteristics of the data and the prediction needs to obtain the best prediction results. This will enable the model to better adapt to different data features and prediction tasks, thereby improving the accuracy and stability of prediction.
In summary, this work proposes the EEMD-ALSTM prediction model, based on EEMD, the attention mechanism and LSTM, to predict PM2.5 concentration. The experimental results show that it performs well in terms of predictive ability, with high accuracy and stability. Future research will further optimize the network parameters of the model to improve its predictive ability and provide a more reliable decision-making basis for environmental management and public health.

Figure 10. IMFs and instantaneous frequency after raw data decomposition.

Table 1. The evaluation results of each model.